Abstract

We present and analyze a simple and general scheme to build a churn (fault)-tolerant structured
Peer-to-Peer (P2P) network. Our scheme shows how to "convert" a static network into a dynamic
distributed hash table (DHT)-based P2P network such that all the good properties of the static network
are guaranteed with high probability (w.h.p.). Applying our scheme to a cube-connected cycles network,
for example, yields an $O(\log N)$-degree connected network in which every search succeeds in
$O(\log N)$ hops w.h.p., using $O(\log N)$ messages, where N is the expected stable network size. Our
scheme has a constant storage overhead (the number of nodes responsible for servicing a data item) and an
$O(\log N)$ overhead (messages and time) per insertion and essentially no overhead for deletions. All
these bounds are essentially optimal. While DHT schemes with similar guarantees are already known in the
literature, this work is new in the following aspects: (1) it presents a rigorous mathematical analysis of
the scheme under a general stochastic model of churn and shows the above guarantees; (2) the theoretical
analysis is complemented by a simulation-based analysis that validates the asymptotic bounds even in
moderately sized networks and also studies performance under changing stable network size; (3) the
presented scheme seems especially suitable for maintaining dynamic structures under churn efficiently.
In particular, we show that a spanning tree of low diameter can be efficiently maintained in constant time
and a logarithmic number of messages per insertion or deletion w.h.p.
1 Introduction

Peer-to-Peer (P2P) networks are highly dynamic: peers enter and leave the network, and connections may be
added or deleted at any time, so the topology changes continually. The independent arrival and
departure of a large number of peers creates a collective effect that is called churn. Measurement studies
of real-world P2P networks [25, 26, 29] show that the churn rate is quite high: nearly 50% of peers in
real-world networks can be replaced within an hour. However, despite a large churn rate, these studies show
that the total number of peers in the network is relatively stable. The study by Stutzbach and Rejaie [29]
also indicates that P2P networks exhibit a high degree of variance in terms of the session time (the amount
of time spent by a node in the network in a session). They show that the distribution of session times appears
to follow a Weibull or lognormal distribution.

P2P systems must have efficient and reliable routing in the presence of a dynamically changing network. 
The P2P overlay must exhibit good topological properties (e.g., connectivity, low diameter, low degree)
even if the composition of the underlying physical network exhibits significant change. Because the system
dynamics of these networks are also highly asymmetric, with only a small number of peers persistent over
significant time periods, providing churn-tolerance in the presence of mostly short-lived peers is essential.

A Distributed Hash Table (DHT) scheme (e.g., [21, 28, 15]) creates a fully decentralized index that maps
data items to peers and allows a peer to search for a data item very efficiently (typically logarithmically in
the size of the network) without flooding. Such systems have been called structured P2P networks, unlike
Gnutella, for example, which is unstructured. In unstructured networks, there is no relation between the
file identifier and the peer where it resides. Structured networks are more difficult to implement than
unstructured networks, mainly because it is not easy to maintain a DHT in a highly dynamic
setting. In addition, since data is typically stored in some arbitrary node, fault-tolerance to node
deletion is essential. These are some of the reasons why structured P2P networks, despite their efficient
search mechanism, have been somewhat less successful than unstructured networks when it comes to
practical deployment. Hence, it is of both theoretical and practical interest to develop simple and efficient DHT
schemes that work well under high churn rates.

In this paper, we present and analyze a simple and general scheme to build a churn-tolerant structured
P2P network. The basic idea behind our scheme is simple. It is easy to design a static topology with desirable
properties such as connectivity, low degree, low diameter, and an efficient (and local) routing algorithm.
Indeed such topologies, e.g., hypercube, butterfly, cube-connected cycles, and de Bruijn graphs, have been
studied extensively in the parallel and distributed computing literature. Our scheme shows how to "convert"
a static graph into a fault-tolerant DHT network such that all the good properties of the static graph are
guaranteed with high probability. For example, applying our scheme to a cube-connected cycles (CCC)
graph yields an $O(\log N)$-degree P2P network that has $O(\log N)$ latency (i.e., search time), using
$O(\log N)$ messages, with constant storage overhead. Here N is the expected stable network size (cf.
Section 3.1). Our bounds are essentially optimal since in our model (cf. Section 3.1), if we want all nodes
to access all data items with high probability, then it is necessary that the degree be $\Omega(\log N)$;
otherwise there is a non-negligible probability that some nodes are disconnected from the system. In a
dynamic network, there is the additional challenge of quantifying the work done by the algorithm to
maintain the desired properties. An important advantage of the above protocol is that it takes
$O(\log N)$ overhead (messages and time) per insertion and no overhead for deletions. This is optimal
since Liben-Nowell et al. [14] show that $\Omega(\log N)$ work is required to maintain (even)
connectivity in this stochastic model.

Our scheme is an improvement in the degree and message complexity over the network of Saia et
al. [24]. The structured P2P network described by Saia et al. [24] has $O(\log N)$ latency for search, using
$O(\log^2 N)$ messages, and $O(\log^3 N)$ degree. Their fault-tolerant overlay is a butterfly-based expander
topology. Their work guarantees that a large number of data items are available even if a large fraction of
arbitrary peers are deleted (hence their scheme can tolerate even adversarial deletions, unlike ours), under
the assumption that, at any time, the number of peers deleted by an adversary must be smaller than the
number of peers joining. In contrast, our scheme constructs an $O(\log N)$-latency and $\Theta(\log N)$-degree
P2P network that guarantees that every search succeeds with high probability (w.h.p.) at any time, rather
than just a large fraction, under a natural and general stochastic model: the $M/G/\infty$ model [22]. In an
$M/G/\infty$ model the holding (session) times of nodes can have an arbitrary distribution, while the arrival
of nodes is assumed to be Poisson. (Real-world P2P network measurement studies [25, 29] have shown that
this is a reasonable statistical model.) The construction of our overlay is also much simpler compared to [24].
Our scheme also improves significantly on the Warp scheme [10]. Warp guarantees $O(\log N)$ search time, but
has a degree of $O(\log^3 N)$. Our scheme has low maintenance overhead. In particular, node deletions do
not incur any overhead. Multiple nodes can join and leave at the same time (in particular, up to a constant
fraction of the total can leave and join at the same time) without any need to change the protocol, and hence
our protocol can operate in a highly dynamic setting.

In a P2P network it is important to design distributed dynamic algorithms that maintain fundamental
communication primitives such as spanning trees and spanners. For example, maintaining a breadth-first
search tree is useful for efficient broadcasting, aggregation, and routing. Designing efficient distributed
dynamic algorithms is challenging and only a few results are known; see, e.g., the work of [?], which gives a
distributed dynamic algorithm for maintaining a spanner. It is non-trivial to efficiently maintain even some
spanning tree dynamically: the trivial method would be to recompute a spanning tree (e.g., by breadth-first
search [?]) every time the network changes. However, this will take $\Theta(D)$ time and $\Theta(|E|)$
messages [20]. In contrast, we show how a spanning tree of diameter $O(D)$ (where D is the diameter of the
underlying graph) can be maintained by our scheme in $O(\log N)$ messages and $O(1)$ time per insertion or
deletion. It is not clear how one can efficiently maintain a breadth-first spanning tree or some low-diameter
spanning tree in many previous schemes, e.g., Chord [28].

Other Related Work. The literature on DHT schemes is huge and we confine ourselves to work that appears
relevant to ours. The idea of a general scheme for mapping a static network into a dynamic one has
appeared before; see, e.g., [18], [1], [?]. The work of [27] uses a CCC graph (this is also the graph used to
illustrate our general scheme in this paper) to build a structured P2P network. However, the above papers
do not present a rigorous analysis of the performance under a realistic stochastic model. Furthermore, to
the best of our knowledge, none of the previous works addresses the problem of efficient maintenance of
spanning substructures under churn.

There have been other works on building fault-tolerant DHTs under different deletion models, namely
adversarial deletions and stochastic deletions. For example, the work of [12] deals with adversarial churn and
gives techniques to handle worst-case joins and leaves. Fiat and Saia [6] proposed a DHT network that
is robust against adversarial deletions (i.e., an adversary can choose which nodes to fail). In this model
some small fraction of the non-failed nodes would be denied access to some of the data items. While
this solution is more general than our model, it has some disadvantages: (1) it is not clear whether the
system can guarantee its bounds when nodes leave and join dynamically; (2) the message complexity is large,
$O(\log^3 N)$, and so is the network degree. Moreover, their construction is very complicated, which can
increase the likelihood of error in implementation and decrease the possibility of practical use. In a
subsequent paper Saia et al. [24] address the first problem and give a scheme with $O(\log N)$ time for search,
using $O(\log^2 N)$ messages, and $O(\log^3 N)$ degree. Datar [4] gives a scheme based on the multibutterfly
network that improves on the scheme of Fiat and Saia [6] under the adversarial deletion model. Naor and
Wieder [17] describe a simple DHT scheme that is robust under the following simple random deletion model:
each node can fail independently with probability p. They show that their scheme can guarantee
logarithmic degree, search time, and message complexity if p is sufficiently small. In contrast, our scheme is
simpler than [17] and works under a more realistic stochastic deletion model (even a large constant fraction
of nodes can get deleted in our model) and guarantees the same (essentially optimal) performance bounds.
Also, our scheme requires no maintenance overhead under deletions, unlike the scheme of [17]. Hildrum
and Kubiatowicz [?] describe how to modify two popular DHTs, Pastry [23] and Tapestry [31], to tolerate
random deletions. Finally, we point out that several DHT schemes (e.g., [28, 21, 11]) have been shown to
be robust under the simple random deletion model mentioned above.

2 The Scheme 

We will show how to build a P2P network $G = (V_G, E_G)$ of expected stable size $N = |V_G|$ (defined
precisely in Section 3.1). Let $H = (V_H, E_H)$ be the static ("template") graph that will be used to build G.
(We will later show how the network can dynamically be made to adapt to a changing network size. Note
that stable means that the total network size remains more or less the same, up to constant factors.) We
will use the term node to denote a node (peer) of G and the term vertex to denote a vertex of H.

Although, in principle, any graph can be taken as a template (or "backbone") graph, for the purposes of
constructing efficient P2P networks it is desirable that H has certain properties such as connectivity,
regularity, recursive structure, constant degree, logarithmic diameter, and a simple and efficient (local)
routing scheme. Good candidates for H are the hypercube network and its variants (butterfly, Benes network,
and cube-connected cycles), the de Bruijn graph, etc. Henceforth, the following assumptions will be made
with respect to H:

1. H has diameter D and maximum degree $\Delta$.

2. H has a local and efficient routing scheme $\mathcal{R}$ that can route between any two nodes in $O(D)$ time
using $O(D)$ messages, where D is the diameter of H. Specifically, it will be required that H has a
vertex labeling scheme that enables shortest path routing with low memory overhead (see, e.g., [8] for
a survey on such routing schemes). In such a routing scheme, vertex labels are assigned in such a way
that every vertex v, given the destination address u, can decide locally (based solely on the address
of u) the outgoing edge of v that (eventually) leads to u by using only a routing table of size at most
$O(\Delta)$ entries per node. (Each entry of the routing table will specify which outgoing edge to take for
a given destination u.) A well-known example of such a scheme is the bit-fixing routing scheme in a
hypercube (and its variants) [13].

Given the above assumptions, our scheme builds a DHT-based P2P network (with expected stable size 
N) with the following properties: 

• The degree of a node and its routing table size are bounded by $O(\Delta \log N)$ w.h.p. (cf. Theorem 3.2).

• At any time, the network is connected and has a diameter of $O(D)$ w.h.p. (cf. Corollary 3.2).

• Every search will succeed in $O(D)$ time w.h.p. and will use $O(D)$ messages (cf. Theorem 3.3).

• The time and message overheads for a node to join the network are $O(D)$ and $O(D + \Delta \log N)$,
respectively, w.h.p. (cf. Theorem 3.4).

• The number of nodes responsible for servicing a data item is $O(1)$.

Throughout, we will illustrate by taking H to be a cube-connected cycles (CCC) network. Our scheme can
be adapted to other similar types of graphs. The r-dimensional CCC is constructed from the r-dimensional
hypercube by replacing each vertex of the hypercube with a cycle of r vertices in the CCC. The ith dimension
edge incident to a vertex of the hypercube is then connected to the ith vertex of the corresponding cycle of
the CCC [13]. In a CCC, the label of a vertex is represented by a pair $\langle w, i \rangle$, where i is the
position of the node within its cycle and w is the label of the vertex in the hypercube that corresponds to the
cycle. Two vertices $\langle w, i \rangle$ and $\langle w', i' \rangle$ are linked by an edge in the CCC if and only
if either (1) $w = w'$ and $i - i' \equiv \pm 1 \pmod{r}$, or (2) $i = i'$ and w differs from w' in precisely the
ith bit. Edges of the first type are called cycle edges, while edges of the second type are referred to as
hypercube edges. A CCC graph of N nodes has diameter $O(\log N)$ and each node has degree 3. A CCC has an
efficient routing scheme, namely the bit-fixing routing scheme [13], that can route in $O(\log N)$ time using
$O(\log N)$ messages using only routing tables of size $O(\log N)$. In this scheme, to route a message between
two vertices with vertex labels $\langle u, i \rangle$ and $\langle v, j \rangle$, the bits of u are successively
transformed (say, from the first to the last) to match v. The message is routed from one dimension to the next
using the hypercube edges, while the cycle edges are used to bring the message to the vertex of the cycle with
the appropriate dimension.
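
The adjacency rule and the bit-fixing route described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions and not part of the scheme's specification: cycle positions are numbered 0 to r - 1 (the text numbers them 1 to r), the hypercube label w is stored as an r-bit integer, and the names ccc_neighbors and bit_fixing_route are hypothetical.

```python
from typing import List, Tuple

# Assumed encoding: (w, i) with w an r-bit integer and i a cycle position in 0..r-1.
Vertex = Tuple[int, int]


def ccc_neighbors(v: Vertex, r: int) -> List[Vertex]:
    """Return the three neighbors of v in the r-dimensional CCC (assumes r >= 3)."""
    w, i = v
    return [
        (w, (i + 1) % r),    # cycle edge
        (w, (i - 1) % r),    # cycle edge
        (w ^ (1 << i), i),   # hypercube edge: flip bit i of w
    ]


def bit_fixing_route(src: Vertex, dst: Vertex, r: int) -> List[Vertex]:
    """Sweep the cycle once, fixing each differing bit, then walk to dst's position."""
    (w, i), (target_w, target_i) = src, dst
    path = [(w, i)]
    for _ in range(r):
        if ((w ^ target_w) >> i) & 1:   # bit i still differs: use the hypercube edge
            w ^= 1 << i
            path.append((w, i))
        if w == target_w:
            break
        i = (i + 1) % r                 # move to the next dimension along a cycle edge
        path.append((w, i))
    while i != target_i:                # finish along the cycle, in the shorter direction
        step = 1 if (target_i - i) % r <= r // 2 else -1
        i = (i + step) % r
        path.append((w, i))
    return path
```

For example, with r = 3 the call bit_fixing_route((0b000, 0), (0b101, 1), 3) returns [(0, 0), (1, 0), (1, 1), (1, 2), (5, 2), (5, 1)]; every consecutive pair is a cycle edge or a hypercube edge, and the route uses O(r) hops.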

A node x in G has a label called the node-id, which corresponds to a vertex label of H. We will choose
the size of H to be $S = |V_H| = N/(\alpha \log N)$, where $\alpha > 2$ is a constant (any such $\alpha$ will
suffice). (Throughout we will assume logarithms to the base 2. We will omit floors and ceilings, assuming that
the quantity in question is rounded to the nearest integer.) Node-ids are assigned randomly by sampling from
all possible vertex labels of H. Specifically, if H is a CCC, the node-id of a node is obtained as follows: toss
a fair coin (with equal probability of getting 0 or 1) $r = \log(N/(\alpha \log^2 N))$ times independently and
obtain an r-bit random bit string $\sigma_r$ (r is the dimension of the CCC). Also sample a random number from
1 to r and call it i. Then the node-id of the node is $\langle \sigma_r, i \rangle$. We say that the node covers
the vertex having the label corresponding to its node-id. There is an edge between two nodes with node-ids x
and y if there is an edge in H between x and y or if x = y. (Thus nodes that share the same vertex label will
form a clique.) We call a vertex in H occupied if there is a node in the network (i.e., a live peer) that covers
this vertex; otherwise we call it a hole.
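
As a rough illustration of the node-id assignment, the following Python sketch computes the CCC dimension from an estimate of N and samples a node-id. The function names, the default value 3.0 for the constant, the lower clamp r >= 3, and the 0-based cycle position are assumptions made for the example only.

```python
import math
import random
from typing import Tuple


def ccc_dimension(n: int, alpha: float = 3.0) -> int:
    """r = log2(N / (alpha * log2(N)^2)), rounded to the nearest integer.

    The clamp to at least 3 is an assumption of this sketch, so that the CCC is well formed.
    """
    return max(3, round(math.log2(n / (alpha * math.log2(n) ** 2))))


def sample_node_id(r: int, rng: random.Random) -> Tuple[int, int]:
    """Sample a node-id: r fair coin tosses plus a uniform cycle position."""
    w = rng.getrandbits(r)   # r independent fair bits, stored as an integer
    i = rng.randrange(r)     # cycle position in 0..r-1 (the text uses 1..r)
    return (w, i)
```

With N = 2^20 and the constant set to 3, this gives r = 10, so the template CCC has 10 * 2^10 = 10240 vertices.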

Joining and Leaving the Network. A node (say v) that wants to join the network chooses its node-id as
explained above. We assume that N (the expected stable network size) or an estimate of N (a constant
factor estimate is sufficient) is known to all joining nodes. Because of the numbering scheme, v can locally
determine the node-ids of its (potential) neighbors without any global knowledge. The neighbors of v in the
P2P network are the nodes that cover the node-ids determined above. To join the network, v contacts any one
of the nodes in the network (such entry points are provided by an external mechanism). Node v can then make
use of an efficient routing scheme $\mathcal{R}$ of H to find its neighbors (i.e., their IP addresses) and joins by
connecting to them. If H is a CCC, $\mathcal{R}$ can be the standard bit-fixing routing scheme mentioned earlier.

A node can leave the network at any time; the node's data is transferred to a randomly chosen node with 
the same vertex label (note that all such nodes are neighbors of the leaving node). We show later that such a 
node will always exist w.h.p. in our model.

Search (Look-up) Scheme. Searches are handled by a DHT scheme, similar to other DHT schemes such as
Chord [28]. The data (or key) is hashed to a random vertex label in the same fashion as was done for choosing
the node-id of a node. Data is inserted at a randomly chosen node having this label as its node-id. A search
for this data is thus directed to some node (say u) having its node-id equal to the data's hashed value. The
data will be stored in u or any one of the neighbors of u that share the same vertex label. Since all nodes
sharing a node-id are connected to each other (forming a clique), search will succeed even if only one node
covering this vertex is live in the network (this node will have the data). Search is performed by invoking
the bit-fixing routing scheme, as the following example illustrates. Suppose a node with node-id x wants to
search for a data item hashed to a number t ($1 \le t \le S$). Let the route given by the bit-fixing routing
scheme from x to t be $\langle x, u_1, u_2, \ldots, t \rangle$. Then x will send a message to a neighbor node that
covers $u_1$, which in turn will forward it to its neighbor node that covers $u_2$, and so on.
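
The forwarding step can be sketched in Python as follows. This is a minimal sketch under assumptions that are not fixed by the text: keys are hashed with SHA-256 (any hash that behaves like a uniform random choice of vertex label would do), the route is given as a list of vertex labels, and cover is a hypothetical map from each vertex label to the live nodes covering it.

```python
import hashlib
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def hash_to_vertex(key: str, r: int) -> Tuple[int, int]:
    """Map a key to a CCC vertex label (w, i); SHA-256 is an assumption of this sketch."""
    digest = int.from_bytes(hashlib.sha256(key.encode()).digest(), "big")
    w = digest % (1 << r)     # r low-order bits give the hypercube label
    i = (digest >> r) % r     # further bits give the cycle position in 0..r-1
    return (w, i)


def forward_search(route: Sequence[Hashable],
                   cover: Dict[Hashable, List[str]]) -> Optional[List[str]]:
    """Follow the template route hop by hop; return the nodes visited, or None at a hole."""
    hops: List[str] = []
    for label in route[1:]:            # route[0] is the vertex of the searching node
        live = cover.get(label, [])
        if not live:
            return None                # unoccupied vertex (hole): the search fails
        hops.append(live[0])           # any live node covering the vertex will do
    return hops
```

For instance, with route [(0, 0), (1, 0), (1, 1)] and cover {(1, 0): ["a"], (1, 1): ["b", "c"]}, the search visits ["a", "b"]; if the entry for (1, 1) were empty, the function would return None.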

3 Analysis 

We analyze various network parameters: routing table size (i.e., degree), connectivity and diameter,
maintenance overhead for joins, and the complexity of search. We first describe the stochastic model used
in our analysis.

3.1 Model 

In evaluating the performance of our protocol we focus on the long-term behavior of the system, in which
nodes arrive and depart in an uncoordinated and unpredictable fashion. We model this setting by a stochastic
continuous-time process: the arrival of new nodes is modeled by a Poisson process with rate $\lambda$, and
the duration of time a node stays connected to the network is independently determined by an arbitrary
distribution G with mean $1/\mu$. This is also called the $M/G/\infty$ model in queuing theory. (This is more
realistic than the less general $M/M/\infty$ model used in [19] to model P2P networks.) Measurement studies of
real P2P systems [25, 26, 29] indicate that the above model approximates real-life data reasonably well,
especially since the holding time distribution is arbitrary (in particular, the study in [29] actually indicates
that the holding times may follow Weibull or lognormal distributions).

Let $G_t$ be the network at time t ($G_0$ has no vertices). We are interested in analyzing the evolution in time
of the stochastic process $\mathcal{G} = (G_t)_{t \ge 0}$. Since the evolution of $\mathcal{G}$ depends only on
$\lambda/\mu$, we can assume w.l.o.g. that $\lambda = 1$. To demonstrate the relation between these parameters
and the network size, we use $N = \lambda/\mu$ throughout the analysis. We justify this notation by showing that
the number of nodes in the network rapidly converges to N, which we call the expected stable network size (or
simply, stable network size). We use the notation $G_t = (V_t, E_t)$ for the network at time t.

Missing proofs of theorems and lemmas can be found in the Appendix, which also states the Chernoff
bounds that we use in the proofs.

3.2 Network Size 

The following theorem characterizes the network size and is a consequence of the fact that the number
of nodes at any time t has a Poisson distribution (this is true even if the holding times follow an arbitrary
distribution) [22, pages 18-19]; applying the Chernoff bound for the Poisson distribution gives the high
probability result.

Theorem 3.1 (Network Size) If $t/N \to \infty$ then $E[|V_t|] = N$, and w.h.p. $|V_t| = N \pm o(N)$.

The above theorem assumed that the ratio $N = \lambda/\mu$ was fixed during the interval [0, t]. We can derive
a similar result for the case in which the ratio changes to $N' = \lambda'/\mu'$ at time $\tau$.

Corollary 3.1 Suppose that the ratio between arrival and departure rates in the network changed at time
$\tau$ from N to N'. Suppose that there were M nodes in the network at time $\tau$; then if [...] $\to \infty$,
w.h.p. $G_t$ has $N' + o(N')$ nodes.
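
As a quick numerical sanity check of Theorem 3.1, the following Python sketch simulates the arrival and departure process and counts the nodes present at time t. It is an illustrative sketch only: the function name population_at and the use of a caller-supplied session sampler are assumptions, chosen so that any holding-time distribution with mean $1/\mu$ can be plugged in.

```python
import random
from typing import Callable


def population_at(t: float, lam: float,
                  sample_session: Callable[[random.Random], float],
                  rng: random.Random) -> int:
    """Count nodes alive at time t under Poisson arrivals (rate lam) and i.i.d. sessions."""
    alive = 0
    now = rng.expovariate(lam)              # time of the first arrival
    while now < t:
        if now + sample_session(rng) > t:   # this node is still present at time t
            alive += 1
        now += rng.expovariate(lam)         # exponential inter-arrival times
    return alive
```

For example, with arrival rate 1 and sessions drawn uniformly from [0, 2N] (mean N), calling population_at(20 * N, 1.0, lambda g: g.uniform(0, 2 * N), random.Random(0)) should return a value close to N, with fluctuations on the order of the square root of N.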

3.3 Network Degree 

Theorem 3.2 (Degree) At any time t such that $t/N \to \infty$, the degree of a node and the routing table size
are bounded by $O(\Delta \log N)$ w.h.p., where $\Delta$ is the maximum degree of H. (If H is a CCC, then the
degree is $O(\log N)$.)

3.4 Fault-tolerance and Search 

We show that every query succeeds w.h.p. at any time (after a short initial period). We show this by first
proving that every vertex is occupied w.h.p., which ensures that queries that (logically) map to a vertex
can be serviced by some live node covering that vertex. This fact, along with the way edges are
constructed in the P2P network, shows that every search succeeds w.h.p.

The following lemma shows that there is no hole w.h.p. Recall that a vertex is called a hole (see Section
2) if there is no node (i.e., a live peer) in the network that covers it.

Lemma 3.1 (Occupancy of Vertices) At any time t such that $t/N \to \infty$, w.h.p. every vertex of H is
occupied.

Proof: When $t/N \to \infty$, nodes depart the network according to a Poisson process with rate 1. Also,
from Theorem 3.1, the number of nodes in the network is at least $N - o(N)$ with probability at least
$1 - 1/N^{\Omega(1)}$. Since, at any time t, every node has the same probability of occupying each of the S
vertices of H, the probability that a given vertex is not covered is at most

$$(1 - 1/S)^{N - o(N)} \le e^{-(1 - o(1))\,\alpha \log N} \le 1/N^2$$

by our choice of S ($\alpha > 2$).

Applying Boole's inequality (union bound) over the $S \le N$ vertices, the probability that some vertex is
unoccupied is at most $1/N$.

□ 
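
The two bounds in the proof can be checked numerically for concrete parameters. The Python sketch below is only an illustration: it plugs in exactly N nodes (ignoring the o(N) slack), uses base-2 logarithms as in the text, and the function name and the choice of 3 for the constant are assumptions.

```python
import math
from typing import Tuple


def hole_probability_bounds(n: int, alpha: float) -> Tuple[float, float]:
    """Return (P[a fixed vertex is a hole], union bound over all S vertices)."""
    s = n / (alpha * math.log2(n))      # S = N / (alpha log N) template vertices
    per_vertex = (1.0 - 1.0 / s) ** n   # all n nodes miss one fixed vertex
    return per_vertex, s * per_vertex   # Boole's inequality over the S vertices
```

For N = 2^16 and the constant set to 3, the per-vertex probability is about 10^-21, far below 1/N^2, and the union bound over all vertices is far below 1/N.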

The following theorem on the success probability of a search query is a consequence of the previous
lemma and the way nodes link to each other. Note that we assume that one time unit is taken for sending a
message across an edge (i.e., one hop).

Theorem 3.3 (Search) For any time t such that $t/N \to \infty$, w.h.p. any search query will be successful.
The time (number of hops) needed is $O(D)$ w.h.p., where D is the diameter of H.

Proof: Consider a search query emanating at time t from the node with node-id x for a node covering a
node-id y (the hash value of the data). This search will be successful if there is a path in G to one of the
nodes covering y. In terms of the template graph H, consider the path from x to y given by the routing
scheme $\mathcal{R}$, of length $O(D)$. (If H is a CCC, then $\mathcal{R}$ is the bit-fixing scheme and D is
$O(\log N)$.) This path goes through a sequence of vertices in H. From Lemma 3.1 it follows (via a union
bound) that every vertex of H is occupied w.h.p. for a time interval of length $O(D)$ (starting from time t).
Thus, during this time interval, every vertex in H is covered by some (live) node in the network. From our
construction of G there is an edge between any node covering a vertex and any node covering a neighbor of
that vertex. Thus, w.h.p. the query will take $O(D)$ time. □

The above theorem also implies the following result on the connectivity and diameter of the network. 

Corollary 3.2 (Connectivity and Diameter) For any time t such that $t/N \to \infty$, the network is connected
and has a diameter of $O(D)$ w.h.p.

Theorem 3.4 (Overhead of Joining) For any time t such that $t/N \to \infty$, the time and message overheads
for a node to join the network are respectively $O(D)$ and $O(D + \Delta \log N)$ w.h.p.

4 Dynamic Maintenance of Spanning Tree of Low Diameter 

The scheme admits a simple local algorithm to dynamically maintain a spanning tree whose diameter is
almost optimal, i.e., essentially the same as that of the underlying template graph, namely D. Note that the
diameter of the P2P network is $O(D)$. Let $G_t$ be the network at time t. The goal is to efficiently compute a
spanning tree of $G_t$, denoted by $T(G_t)$, of diameter $O(D)$.

The P2P network constructed by our scheme admits a very simple and efficient algorithm. Let $T(G_{t'})$ be
the spanning tree of diameter $O(D)$ at some time t' such that $t'/N \to \infty$. We will first describe how
$T(G_{t'})$ is constructed at time t' and then describe how it is maintained under insertions and deletions at
any time $t > t'$. (With a very small probability, one may have to construct the spanning tree from scratch at
any time t, as discussed below. Thus, strictly speaking, the complexity bounds hold in an amortized sense.)

Let $S(\ell)$ be the set of nodes that share the same vertex label $\ell$. Construct a breadth-first tree $T_H$ on
the template graph H. Choose a (distinguished) node $u(\ell) \in S(\ell)$; we call $u(\ell)$ the leader node of the
set of nodes belonging to $S(\ell)$. The tree $T(G_t)$ is constructed as follows. Connect the leader nodes of the
respective vertex labels as they are connected in the breadth-first tree $T_H$. Make all non-leader nodes of
$S(\ell)$ the children of the leader node $u(\ell)$. Note that non-leader nodes will all be leaves in $T(G_t)$.

The tree is maintained as follows: 
Insertion: Let a node v be inserted, and let it have vertex label $\ell$. Then the node is added as a child of
the leader node of $S(\ell)$. (Note that a leader exists w.h.p. by Lemma 3.1.) The time and message complexity
is $O(1)$ per insertion.
Deletion: Let a node w be deleted. There are two cases. If w is a non-leader node, it is simply removed.
Note that this does not disconnect the tree, as w is a leaf node. On the other hand, let w be a leader
node and let the vertex label of w be $\ell$. Then w is deleted and in its place another (non-leader) node, say x,
belonging to $S(\ell)$ (i.e., the nodes that have the same vertex label $\ell$) is elected as leader. By our tree
construction, x is a leaf child of w (again, such a node exists w.h.p. by Lemma 3.1). Thus the rest of the tree is
not affected. Also, by our construction, x has an edge to w's parent node and to all of w's other children. Thus
connectivity is preserved and the diameter is still $O(D)$. The message complexity is $O(\log N)$ per deletion
(since only that many nodes are affected) and the time complexity is $O(1)$. Note that leader election itself can
be done in $O(1)$ rounds with $O(\log N)$ message complexity, as the set $S(\ell)$ forms a clique.

There is a small probability that the above algorithm will fail, e.g., when deleting a node leaves the
corresponding vertex unoccupied (i.e., a hole). In such a case, one has to reconstruct the tree from scratch as
discussed first.
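
The construction and the two maintenance rules can be sketched with a small Python class. This is a centralized illustration of the bookkeeping only, not the distributed protocol: the class name LeaderTree is hypothetical, the template tree is given as a parent map over vertex labels, and the leader is taken to be the first listed member of each set, whereas the text leaves the election rule open.

```python
from typing import Dict, Hashable, List, Optional


class LeaderTree:
    """Spanning tree of G induced by a BFS tree of H plus one leader per vertex label."""

    def __init__(self, template_parent: Dict[Hashable, Optional[Hashable]]) -> None:
        # template_parent[label] is the parent label in the BFS tree of H (None at the root).
        self.template_parent = template_parent
        self.members: Dict[Hashable, List[str]] = {label: [] for label in template_parent}

    def leader(self, label: Hashable) -> str:
        return self.members[label][0]       # assumption: the first member acts as leader

    def insert(self, node: str, label: Hashable) -> None:
        self.members[label].append(node)    # joins as a leaf child of the current leader

    def delete(self, node: str, label: Hashable) -> None:
        group = self.members[label]
        group.remove(node)                  # if the leader left, the next member takes over
        if not group:
            raise RuntimeError("hole: the tree must be rebuilt from scratch")

    def parent(self, node: str, label: Hashable) -> Optional[str]:
        if node != self.leader(label):
            return self.leader(label)       # non-leaders hang off their leader
        up = self.template_parent[label]
        return None if up is None else self.leader(up)
```

Because parents are derived from the current leaders, a leader change automatically re-attaches the new leader to the parent and children of the old one, which mirrors the deletion rule above; only the nodes sharing the affected label and the leaders of adjacent labels see a different parent.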

5 Handling Change in Stable Network Size 

The performance of our scheme depends on the stability of the network. It is easy to see that our scheme can
tolerate changes up to constant factors (thus, as mentioned earlier, it is enough to have an estimate of
N up to some constant factor). However, bad events, such as the network size being drastically reduced,
possibly even leading to the network becoming disconnected, can happen, but with minuscule probability in
our model. In case such events happen (which will eventually happen with probability 1 if the system runs
forever), remedial measures can be taken, such as resorting to an external mechanism to reconnect the network
(if the network gets disconnected) or rejecting new connections (if the size grows far too large) until the
situation corrects itself. Our analysis can be extended to handle such situations.

We now discuss how the scheme can be modified to accommodate gradual changes in stable network
size. As shown in Corollary 3.1, if the ratio between the arrival and departure rates in the network changes,
then this leads to a new expected stable network size. Suppose the new stable size is one-half of the original
network size. How can the network adapt to this changed size? Assume that H is a hypercube (a similar
argument will work for CCC and other related networks). All that is required is to reduce S (the size of H)
by a factor of 2. This can be done easily in a local manner. Each node simply reduces its dimension
by 1. This can be accomplished by dropping the last bit in the node-id. The hash values of data are also
altered in the same way. It is easy to see that because of the recursive nature of the construction of the
hypercube (i.e., a hypercube of dimension r can be constructed from two hypercubes of dimension $r - 1$),
reducing the dimension requires only $O(1)$ overhead per node. To illustrate, consider two nodes with node-ids
$\langle x_1, \ldots, x_r, 0 \rangle$ and $\langle x_1, \ldots, x_r, 1 \rangle$. Dropping the last bit makes both these
nodes cover the vertex with label $\langle x_1, \ldots, x_r \rangle$. Data that were originally serviced by either of
these will now be serviced by both of them. On the other hand, if the stable network size increases by a factor
of two, then each node increases its dimension by one, by adding one more random bit to its node-id (cf.
Section 2). To illustrate, consider the set of nodes with the same node-id $\langle x_1, \ldots, x_r \rangle$.
Randomly adding one more bit (as the last bit) makes, on average, half of the nodes in this set cover the vertex
with label $\langle x_1, \ldots, x_r, 0 \rangle$ and the other half cover $\langle x_1, \ldots, x_r, 1 \rangle$. The data
that are serviced by these nodes also get hashed to the same node-ids. It is not difficult to show that the above
transformation preserves all the properties of the scheme, namely network degree, number of hops needed for
search, fault-tolerance, connectivity, and diameter.
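
The two local transformations can be written down directly. The Python sketch below assumes, for illustration only, that a hypercube node-id is stored as a tuple of bits; the function names shrink and grow are hypothetical.

```python
import random
from typing import Tuple

Bits = Tuple[int, ...]   # assumed representation of a hypercube node-id


def shrink(node_id: Bits) -> Bits:
    """Halve the template: drop the last bit of the node-id."""
    return node_id[:-1]


def grow(node_id: Bits, rng: random.Random) -> Bits:
    """Double the template: append one fresh random bit to the node-id."""
    return node_id + (rng.getrandbits(1),)
```

For example, shrink maps both (1, 0, 1, 0) and (1, 0, 1, 1) to (1, 0, 1), so the two nodes end up covering the same vertex, while grow sends a node with id (1, 0, 1) to (1, 0, 1, 0) or (1, 0, 1, 1) with equal probability.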

6 Simulation Results 

In this section, we present a simulation of the scheme to get a better picture of how the network will react in
practice. The theoretical results proved earlier are asymptotic, i.e., they show that the performance is good
when $t/N \to \infty$. Thus it is also of interest to see performance data measured from simulations for the
average case.

The simulator is written in Java to mimic a network that runs for some time t with stable network size
N. The network loops for t cycles, each adding a sequence of new peers, then removing peers that have
stayed for their predetermined session length. Every so many cycles, the network is inspected to determine various
statistics, e.g., diameter and average degree.

Nodes arrive according to a Poisson distribution with rate $\lambda$. This is achieved by sampling, in each cycle, a
Poisson random variable with rate $\lambda$ and adding that many nodes to the graph, serially. Each node's session
length is then sampled from an arbitrary distribution with mean $1/\mu$. Based on the real-world statistics
in Stutzbach et al. [29], Weibull, log-normal, and exponential distributions fit actual
peer session lengths well. For most simulations, the session length was drawn from a random variable with a Weibull
distribution, with shape parameter k = 0.59 (based on [30]) and the scale parameter varied so that
the mean of the random variable is $1/\mu$. The Poisson rate $\lambda$ and the Weibull mean $1/\mu$ are then
chosen so that $\lambda/\mu = N$.
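
The churn process described above can be sketched as follows. This is a minimal Python illustration, not the Java simulator itself: the function names are hypothetical, Poisson counts are generated by accumulating exponential inter-arrival times, and the Weibull scale is derived from the desired mean through the standard identity mean = scale * Gamma(1 + 1/k).

```python
import math
import random

WEIBULL_SHAPE = 0.59  # shape parameter k quoted in the text


def weibull_scale_for_mean(mean: float, shape: float = WEIBULL_SHAPE) -> float:
    """Scale for which a Weibull(shape, scale) variable has the given mean."""
    return mean / math.gamma(1.0 + 1.0 / shape)


def poisson_sample(rate: float, rng: random.Random) -> int:
    """Number of arrivals of a Poisson process with this rate in one time unit."""
    count, elapsed = 0, rng.expovariate(rate)
    while elapsed < 1.0:
        count += 1
        elapsed += rng.expovariate(rate)
    return count


def simulate(stable_size: int, mean_session: float, cycles: int, seed: int = 0) -> int:
    """Run the add-then-remove loop and return the number of live nodes."""
    rng = random.Random(seed)
    arrival_rate = stable_size / mean_session  # rate / (1 / mean) = stable size
    scale = weibull_scale_for_mean(mean_session)
    departures = []  # departure time of every live node
    for now in range(cycles):
        # Add this cycle's arrivals, each with its own sampled session length.
        for _ in range(poisson_sample(arrival_rate, rng)):
            departures.append(now + rng.weibullvariate(scale, WEIBULL_SHAPE))
        # Remove nodes whose session has ended.
        departures = [d for d in departures if d > now]
    return len(departures)


live = simulate(stable_size=500, mean_session=50.0, cycles=2000)
print("live nodes after warm-up:", live)
```

With these example parameters the live node count should hover near the chosen stable size once the run is long compared with the session lengths; the discrete cycles introduce a small upward bias because a node always survives until the end of the cycle in which it leaves.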

The simulation keeps track of the basic network statistics: diameter, average degree, as well as those 
of interest to this specific network construction: vertex coverage, i.e. the percentage of vertices in the 
"backbone" network that are covered by network nodes; average coverage, i.e. the average number of nodes 
covering a vertex in the "backbone"; and random path length, i.e. the average path length through the 
network over $\log N$ paths.

6.1 Coverage 

The coverage (i.e., occupancy of the underlying template graph) of the network is important: without 100%
coverage, efficient routing through the network cannot be assured, and if the coverage becomes
too low, the network may become disconnected. Coverage is measured as the percentage of vertices in
the template graph (i.e., CCC) that are covered by a node in the network. Coverage is tied closely to the
dimension of the CCC graph in relation to the number of nodes in the network. If the dimension of the
CCC graph is too large for the number of nodes, the network can never reach 100% coverage, as seen in the
dimension 10 network in Figure 1 (all figures are placed in the Appendix). However, it can also be seen
that if the dimension fits the number of nodes, the graph will reach 100% coverage quickly. The dimension
$r = \lceil \log(N / \log^2 N) \rceil$, with N being the stable network size, gives 100% coverage once the network reaches
stable size in every simulation.
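
As a worked example of the dimension rule, the following Python sketch computes the dimension, the resulting number of CCC vertices, and the coverage obtained when every node picks a vertex uniformly at random. The use of base-2 logarithms, the reading of the garbled formula as dividing by the square of log N, and the uniform placement are assumptions made for this illustration; the function names are hypothetical.

```python
import math
import random


def ccc_dimension(stable_size: int) -> int:
    """r = ceil(log2(N / (log2 N)^2)); base-2 logarithms are assumed."""
    log_n = math.log2(stable_size)
    return math.ceil(math.log2(stable_size / log_n ** 2))


def ccc_vertex_count(dimension: int) -> int:
    """A CCC of dimension r has r * 2^r vertices."""
    return dimension * 2 ** dimension


def coverage_stats(stable_size: int, seed: int = 0):
    """Return (fraction of covered vertices, average nodes per vertex)."""
    rng = random.Random(seed)
    vertices = ccc_vertex_count(ccc_dimension(stable_size))
    # Assumed placement: every node covers a uniformly random vertex.
    covered = {rng.randrange(vertices) for _ in range(stable_size)}
    return len(covered) / vertices, stable_size / vertices


for n in (10_000, 100_000):
    r = ccc_dimension(n)
    print(n, r, ccc_vertex_count(r), coverage_stats(n))
```

Under these assumptions a network of 10000 nodes gets dimension 6 (384 vertices) and a network of 100000 nodes gets dimension 9 (4608 vertices), so each vertex is covered by roughly 20 to 30 nodes on average and an uncovered vertex is extremely unlikely.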

6.2 Diameter 

The diameter d of a CCC graph of dimension n can be computed as $d = 2n + \lfloor n/2 \rfloor - 2$ for $n \geq 4$
[7]. The diameter of the network was computed approximately by traversing the network with breadth-first
search and taking the diameter to be twice the height of the produced tree. This provides a reasonable
estimate, within a constant factor. For networks with a large number of nodes, this becomes too inefficient
in terms of time, in many simulations tripling the run time. A faster approach is to consider the random
path: two nodes are drawn from the network at random and a path is
routed between them; $\log N$ such paths are sampled and their lengths averaged. This allows a much
faster measurement of the network diameter. As seen in Figure 2, the random path actually provides a
much more accurate diameter than the breadth-first search; due to the efficient shape of the CCC graph, the
BFS-diameter is greater by almost a factor of 2.
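
The two measurements can be compared on a plain CCC graph with the Python sketch below. It is an illustration under stated assumptions: vertices are pairs (x, p) with a bit string x and a cycle position p, shortest paths found by breadth-first search stand in for routed paths, and all function names are hypothetical.

```python
import random
from collections import deque


def ccc_neighbors(vertex, n):
    """Neighbors of (x, p) in a CCC of dimension n: two cycle edges, one cube edge."""
    x, p = vertex
    return [(x, (p + 1) % n), (x, (p - 1) % n), (x ^ (1 << p), p)]


def bfs_distances(source, n):
    """Hop distance from source to every vertex of the CCC."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in ccc_neighbors(v, n):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def formula_diameter(n):
    """Closed form quoted in the text, valid for n >= 4."""
    return 2 * n + n // 2 - 2


def random_path_average(n, samples, rng):
    """Average shortest-path length between randomly chosen vertex pairs."""
    total = 0
    for _ in range(samples):
        source = (rng.randrange(2 ** n), rng.randrange(n))
        target = (rng.randrange(2 ** n), rng.randrange(n))
        total += bfs_distances(source, n)[target]
    return total / samples


for n in (4, 5, 6):
    height = max(bfs_distances((0, 0), n).values())
    print(n, formula_diameter(n), 2 * height, random_path_average(n, 20, random.Random(n)))
```

On an exact CCC the height of the BFS tree already equals the diameter, because the graph looks the same from every vertex, so doubling the height overshoots by a factor of two. This is consistent with the observation that the BFS-based estimate is almost twice the random-path estimate.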

6.3 Average Degree and Coverage 

The average degree of a node in the network can be seen in Figure 3 to grow with the network size, keeping
within a constant factor of $\log N$. The sharp drops in the average degree occur when the network size is
large enough to support a higher dimension CCC graph, spreading the nodes in the network over many more
vertices in the backbone CCC graph.

The average number of nodes covering a vertex is closely related to the average degree of a node. As
seen in Figure 4, it follows the same pattern of growth. It is interesting to point out that, for networks ranging
in size from 10000 to 150000, the average degree and coverage stay constant, at an average degree of around 100 and
an average coverage of around 25.

6.4 Handling Changes in Network Size 

The network will not stay at constant size forever and must compensate for drastic changes in network size. 
To accomplish this, the dimension of the network must be increased or decreased to adjust for an increase 
or decrease in overall network size. This should be accomplished by as decentralized a process as possible, so
each node must itself keep track of the network stability.

We use the following method to detect changes in network size. Each time unit in the simulator, a node
in the network runs a simulation method mimicking normal operations of a node. At regular intervals, the
node examines its neighbors in the network and attempts to ascertain whether the current network is stable, and
takes measures if it is not. Intervals of 100 to 500 time units were tested with success;
with anything shorter, the fluctuation of the network would interfere too greatly for an individual node to correctly
calculate the network status.

A node determines whether it is stable by using the average degree of several nodes and checking whether it
is close to the ideal stable degree, D. The degree of the nodes stays within $O(\log N)$, and based on previous
simulations of networks of up to 1000000 nodes, the stable degree falls around 100,
±50, as seen in Figure 3. Each node tracks the progression of the average degree of a sampling of nodes in
the network, called A. If A begins moving away from the stable degree, the node will lower its
dimension if A is falling or increase its dimension if A is rising. There is a buffer of ±65 around D, so that
random variations in the sample average degree will not trigger an incorrect dimension change.
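
The detection rule can be sketched as a small decision function. The Python below is an illustrative reading of the paragraph, not the simulator's code: the constants take the values quoted in the text, while the function names and the way the sample is drawn are assumptions.

```python
import random

STABLE_DEGREE = 100  # ideal stable degree D quoted in the text
BUFFER = 65          # tolerated deviation of the sample average from D


def sample_average_degree(degrees, sample_size, rng):
    """Average degree A over a random sample of the observed nodes."""
    sample = rng.sample(degrees, min(sample_size, len(degrees)))
    return sum(sample) / len(sample)


def dimension_adjustment(avg_degree, stable_degree=STABLE_DEGREE, buffer=BUFFER):
    """Return -1 (lower dimension), +1 (raise dimension) or 0 (stay)."""
    if avg_degree < stable_degree - buffer:
        return -1  # degree has fallen: too few nodes per vertex
    if avg_degree > stable_degree + buffer:
        return +1  # degree has risen: too many nodes per vertex
    return 0


rng = random.Random(0)
observed = [28, 31, 40, 25, 33, 30]  # hypothetical degrees after a sharp drop in size
a = sample_average_degree(observed, 4, rng)
print(a, dimension_adjustment(a))
```

The buffer makes the rule insensitive to ordinary fluctuations: a sample average anywhere between 35 and 165 leaves the dimension unchanged.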

Each dimension change alters the dimension of the node by only 1. This is to prevent the
network from growing or shrinking too quickly. If the nodes of the network were to make large
increases in dimension, they would need to expand to too large a CCC, increasing the time the network
is disconnected. Decreasing the dimension greatly would cause the cycles of the CCC to be shortened too
much, causing difficulties in network routing.

If each node were left to change by itself, many nodes would not get a chance to change, or would change
too slowly, leaving the network unstable or disconnected. To remedy this, once a node detects a network
instability and changes dimension, it sends a message to each of its neighbors, suggesting that they
change their dimension to its new dimension. Each node monitors these messages, and once it receives
enough of them (simulations have shown that around 5 is sufficient to eliminate any false positives), it
changes its dimension to the suggested dimension, regardless of its own measure of the average degree.
As each node changes, it sends its own suggestion messages, which effectively propagate the change in
dimension across the network.
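
The propagation of suggestion messages behaves like a threshold cascade, which the following Python sketch illustrates. It is a simplified model under stated assumptions: messages are delivered instantly, the overlay is a small ring-like example graph built by the hypothetical helper `ring_neighbors`, and the threshold of 5 is the value quoted in the text.

```python
from collections import deque

SUGGESTION_THRESHOLD = 5  # suggestions needed before a node follows along


def ring_neighbors(n_nodes, reach):
    """Example overlay: every node is linked to the `reach` nodes on each side."""
    return {v: [(v + d) % n_nodes for d in range(-reach, reach + 1) if d != 0]
            for v in range(n_nodes)}


def propagate(neighbors, dimension, initiators, new_dimension,
              threshold=SUGGESTION_THRESHOLD):
    """Spread a dimension change; `dimension` (node -> int) is updated in place."""
    received = {v: 0 for v in neighbors}
    queue = deque()
    for v in initiators:  # nodes that detected the instability themselves
        if dimension[v] != new_dimension:
            dimension[v] = new_dimension
            queue.append(v)
    while queue:
        sender = queue.popleft()
        for v in neighbors[sender]:  # one suggestion per neighbor
            if dimension[v] == new_dimension:
                continue
            received[v] += 1
            if received[v] >= threshold:
                dimension[v] = new_dimension
                queue.append(v)
    return dimension


graph = ring_neighbors(60, 10)
dims = propagate(graph, {v: 9 for v in graph}, initiators=range(5), new_dimension=8)
print(sum(1 for d in dims.values() if d == 8), "of", len(dims), "nodes switched")
```

In this example five adjacent initiators are enough for the change to sweep the whole graph, while four initiators never push any other node over the threshold, which is the intended protection against isolated false positives.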

As the network is undergoing change, nodes are still joining it; a joining node's dimension is decided by rounding
the average dimension of all nodes that share its vertex and node id. Since all nodes that share a vertex have
the same degree, once a change of dimension reaches a vertex, they will all change very quickly, so the
new node will either have the new correct dimension or be in a vertex that the change has not propagated to
yet.
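
The joining rule amounts to a rounded average, as in this short Python sketch. The function name is hypothetical, and rounding halves upward is an assumption, since the text does not say how ties are broken.

```python
import math


def joining_dimension(dimensions_at_vertex):
    """Dimension for a new node: rounded average over nodes sharing its vertex."""
    average = sum(dimensions_at_vertex) / len(dimensions_at_vertex)
    return math.floor(average + 0.5)  # assumed tie-break: halves round up


# A vertex where the change has mostly propagated, and one where it has not.
print(joining_dimension([8, 8, 8, 9]))  # most nodes already at dimension 8
print(joining_dimension([9, 9, 9, 9]))  # change has not reached this vertex yet
```

Because nodes on one vertex switch almost together, the average is usually very close to an integer and the tie-breaking choice rarely matters.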

Figure 5 depicts a typical network response to a large change in network size. The network was simulated
for 450000 time units, during which a 50000 node network dropped to a 20000 node network. The network is
forgiving of small decreases, but once the network drops too far at t = 610, the network corrects itself
quickly, with 70% of the nodes in the network switching in less than 7000 time units. Because nodes are continually
being added while the network is adjusting, perfect instantaneous convergence to the new dimension is
unlikely, but as the network progresses, it will continue to self-adjust and bring the dimension of
all of its nodes to the new correct dimension. The random path statistic in Figure 5 shows the average
route length for a sample of nodes trying to reach a random assortment of nodes, mimicking requests
during normal network operations. While the network contains nodes of differing degrees, it is still able to
function normally, with few disconnects or routing problems.

7 Concluding Remarks 

We presented a simple and general scheme for building a structured P2P network. We analyzed our scheme
under a realistic churn model and provably showed that it gives essentially optimal bounds with respect to
search time, degree, message complexity and maintenance overhead. The scheme offers algorithmic benefits
for efficient distributed dynamic maintenance of spanning trees. It will be interesting to explore dynamic
algorithms for other problems in this scheme. We also conducted a simulation-based study to understand the
average performance of the scheme in networks of moderate size.
Figure 5: Dimension Adjustment



B Proofs 



Throughout our analysis we use the Chernoff bounds for the binomial and the Poisson distributions. Let the
random variable X denote the sum of n independent and identically distributed Bernoulli random variables,
each having a probability p of success. Then, X is binomially distributed with $\mu = np$. We have the
following Chernoff bounds [2]: for $0 < \delta < 1$, $\Pr(X > (1 + \delta)\mu) < e^{-\mu\delta^2/3}$ and
$\Pr(X < (1 - \delta)\mu) < e^{-\mu\delta^2/2}$. We have identical bounds even when X is a Poisson random variable with parameter $\mu$.
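
As a quick numerical sanity check of these bounds, the Python sketch below compares the exact binomial tails with the two Chernoff expressions. The parameter values (n = 200, p = 0.1, deviation 0.5) are arbitrary choices for the example, and the function names are hypothetical.

```python
import math


def binomial_pmf(n, p, k):
    """Pr(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def upper_tail(n, p, threshold):
    """Exact Pr(X > threshold)."""
    return sum(binomial_pmf(n, p, k) for k in range(math.floor(threshold) + 1, n + 1))


def lower_tail(n, p, threshold):
    """Exact Pr(X < threshold)."""
    return sum(binomial_pmf(n, p, k) for k in range(0, math.ceil(threshold)))


def chernoff_upper(mu, delta):
    return math.exp(-mu * delta ** 2 / 3)


def chernoff_lower(mu, delta):
    return math.exp(-mu * delta ** 2 / 2)


n, p, delta = 200, 0.1, 0.5
mu = n * p
print("upper:", upper_tail(n, p, (1 + delta) * mu), "<", chernoff_upper(mu, delta))
print("lower:", lower_tail(n, p, (1 - delta) * mu), "<", chernoff_lower(mu, delta))
```

The exact tails are considerably smaller than the bounds, as expected: the Chernoff expressions trade tightness for a simple closed form that is convenient in the proofs.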



B. 1 Proof of Theorem I3H 

Proof: Consider a node that arrived at time x < t. The probability that the node is still in the network at 
time t is 1 — G(t — x). Let p(t) be the probability that a random node that arrives during the interval [0, t] is 
still in the network at time t; then (since in a Poisson process the arrival time of a random element is uniform
in [0, t]),

$$p(t) = \frac{1}{t} \int_0^t (1 - G(t - x))\,dx = \frac{1}{t} \int_0^t (1 - G(x))\,dx.$$

Our process is similar to an infinite server Poisson queue. Thus, the number of nodes in the graph at
time t has a Poisson distribution with expectation $t\,p(t)$ (see [22, pages 18-19]).

When $t/N \to \infty$, $E[|V_t|] = N$.

We can now use a tail bound for the Poisson distribution ([2, page 239]) to show that

$$\Pr\left( \big| |V_t| - E[|V_t|] \big| < \sqrt{bN \log N} \right) > 1 - 1/N^c$$
for some constants b and c > 1. □
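
The limit used in this proof can be checked numerically. The Python sketch below evaluates t p(t), which equals the integral of 1 - G(x) from 0 to t, for a Weibull session-length distribution. It assumes that time is normalized so that the arrival rate is 1 and the mean session length is N (an assumption not stated in this section), uses the shape parameter from the simulations, and integrates with a simple trapezoid rule; the function name is hypothetical.

```python
import math


def expected_size(mean_session, t, shape=0.59, steps_per_mean=100):
    """t * p(t) = integral_0^t (1 - G(x)) dx for a Weibull G with the given mean."""
    scale = mean_session / math.gamma(1.0 + 1.0 / shape)

    def survival(x):
        # 1 - G(x): probability that a session lasts longer than x.
        return math.exp(-((x / scale) ** shape))

    step = mean_session / steps_per_mean
    n = int(t / step)
    # Trapezoid rule on n equal intervals of width `step`.
    total = 0.5 * (survival(0.0) + survival(n * step))
    total += sum(survival(i * step) for i in range(1, n))
    return total * step


N = 1000
for t in (N, 10 * N, 100 * N):
    print(t, round(expected_size(N, t), 1))
```

The printed values increase with t and approach N, since the integral of the survival function over the whole half-line is exactly the mean session length. With a heavy-tailed Weibull distribution the approach is slower than in the exponential case, which is why t must be large compared with N.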



B.2 Proof of Theorem 1531 

Proof: We first show that the number of node-ids covering a given vertex is $O(\log N)$ w.h.p. When
the network is stable, the number of live nodes is at least $N - o(N)$ w.h.p., i.e., with probability at least
$1 - 1/N^{\Omega(1)}$. Let $Y_j$ be the indicator random variable for the event that some node j covers a given vertex
v. Then

$$\Pr(Y_j = 1) \geq (1 - (1 - 1/S))^{N - o(N)} (1 - 1/N^{\Omega(1)})$$
where $S = \Theta(N/\log N)$ is the size of H. Thus, by linearity of expectation, the expected number of nodes
covering a given vertex is

$$(N \pm o(N)) (1 - (1 - 1/S))^{N - o(N)} (1 - 1/N^{\Omega(1)}) = \Theta(\log N).$$

Applying the Chernoff bound gives the high probability result. 

Since the maximum degree of a vertex is $\Delta$, the number of edges incident on a node is $\Theta(\Delta \log N)$, since
each node has edges to nodes that cover a constant number of vertices and each vertex is covered by $\Theta(\log N)$
nodes w.h.p. The bound on the routing table size follows from the fact that H admits a routing scheme $\mathcal{R}$
that has a routing table size of $O(\Delta)$ entries per node. □

B.3 Proof of Theorem 1541 

Proof: An incoming node has to locate a node in the network with the same node-id; then it can find all
of its neighbors in $O(1)$ time and $O(\log N)$ messages. Finding such a node (starting from some entry point
node) takes $O(D)$ time (Theorem [33]). Hence the total time needed to find all neighbors is $O(D)$. The total
number of messages needed is $O(D + \Delta \log N)$ w.h.p., since D messages are needed for routing (to find a
node of the same id) and routing table updates of total size $O(\Delta \log N)$ have to be done (for the new node
as well as the neighbors of the new node). □